Inversion start site

ATTATAAAGGAAAAAGAAAATAACGCAATGGACAAGTGGTG

860 + + + + 900

TAATATTTCCTTTTTCTTTTATTGCGTTACCTGTTCACCAC (41)
YKGKRK*RNGQVV



AAGCTGTGAACTCAGGTGTGCACAATTATCAGGAACACCCCAAAACCAAAGTGAGGTAGA

901 + + + + + + 960

TTCGACACTTGAGTCCACACGTGTTAATAGTCCTTGTGGGGTTTTGGTTTCACTCCATCT (101)
KL*TQVCTIIRNTPKPK*GR



AATAGCATGAGAAGCCGTGTTTGATGTTAATTAATT

961 + + + 996

TTATCGTACTCTTCGGCACAAACTACAATTAATTAA (137)
NSMRSRV*C*LI 



The inversion sequence of the apo-dystrophin-4 cDNA (SEQ ID NO 1) 



Figure 1 
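Each row in the sequence figures pairs a top strand with its base-by-base complement (drawn beneath it, 3' to 5') and a single-letter translation of one reading frame in which `*` marks a stop codon. As an illustrative sketch only, and not part of the original disclosure, the two derived lines can be regenerated from a top-strand row with the standard genetic code; the function and constant names are hypothetical.

```python
# Sketch: regenerate the complement and frame-1 translation lines of a figure row.
# Assumes the standard genetic code; '*' is printed for stop codons, as in the figures.

BASES = "TCAG"
# Amino acids for codons in TCAG order: TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(top_strand: str) -> str:
    """Base-by-base complement, kept in the same left-to-right order as drawn."""
    return top_strand.translate(COMPLEMENT)


def translate(top_strand: str) -> str:
    """Translate reading frame 1; a trailing partial codon is ignored."""
    seq = top_strand.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")  # 'X' for codons with non-ACGT symbols
        for i in range(0, len(seq) - 2, 3)
    )


if __name__ == "__main__":
    row_901_960 = "AAGCTGTGAACTCAGGTGTGCACAATTATCAGGAACACCCCAAAACCAAAGTGAGGTAGA"
    print(complement(row_901_960))
    print(translate(row_901_960))  # KL*TQVCTIIRNTPKPK*GR
```

Rows whose first base is also the first base of a codon, such as 841-900 and 901-960, reproduce the printed translations directly. The 41-base row that begins at the inversion site (860) starts mid-codon, so its printed translation corresponds to slicing off the first two bases before translating.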






Inversion start site 
I 

TAAAGAAAGAATTATAAAGGAAAAAGAAAATAACGCAATGGACAAGTGGTG

850 + + + + + + 900 

ATTTCTTTCTTAATATTTCCTTTTTCTTTTATTGCGTTACCTGTTCACCAC (51) 
*RKNYKGKRK*RNGQVV 



AAGCTGTGAACTCAGGTGTGCACAATTATCAGGAACACCCCAAAACCAAAGTGAGGTAGA

901 + + + + + + 960 

TTCGACACTTGAGTCCACACGTGTTAATAGTCCTTGTGGGGTTTTGGTTTCACTCCATCT (111) 
KL*TQVCTIIRNTPKPK*GR 



AATAGCATGAGAAGCCGTGTTTGATGTTAATTAATT

961 + + + 996

TTATCGTACTCTTCGGCACAAACTACAATTAATTAA (147)
NSMRSRV*C*LI 



The inversion sequence of the apo-dystrophin-4 cDNA plus a 10 base-pair region 5' to the 
start of the inversion sequence (SEQ ID NO 1A). 



Figure 1A 



Start at 710 
I 

AACAATGGCAG

+ + 720

TTGTTACCGTC (11)
QWQ



GTTTTACACGTCTATGCAATTGTACAAAAAAGTTATAAGAAAACTACATGTAAAATCTTG

721 + + + + + + 780 

CAAAATGTGCAGATACGTTAACATGTTTTTTCAATATTCTTTTGATGTACATTTTAGAAC (71) 
VLHVYAIVQKSYKKTTCKIL 



ATAGCTAAATAACTTGCCATTTCTTTATATGGAACGCATTTTGGGTTGTTTAAAAATTTA

TATCGATTTATTGAACGGTAAAGAAATATACCTTGCGTAAAACCCAACAAATTTTTAAAT (131)
IAK*LAISLYGTHFGLFKNL
inversion start site 
I 

TAACAGTTATAAAGAAAGAATTATAAAGGAAAAAGAAAATAACGCAATGGACAAGTGGTG 

841 + + + + + + 900 

ATTGTCAATATTTCTTTCTTAATATTTCCTTTTTCTTTTATTGCGTTACCTGTTCACCAC (191) 
*QL*RKNYKGKRK*RNGQVV 



AAGCTGTGAACTCAGGTGTGCACAATTATCAGGAACACCCCAAAACCAAAGTGAGGTAGA

901 + + + + + + 960 

TTCGACACTTGAGTCCACACGTGTTAATAGTCCTTGTGGGGTTTTGGTTTCACTCCATCT (251) 
KL*TQVCTIIRNTPKPK*GR



AATAGCATGAGAAGCCGTGTTTGATGTTAATTAATT

961 + + + 996

TTATCGTACTCTTCGGCACAAACTACAATTAATTAA (287)
NSMRSRV*C*LI 



The inversion sequence of the apo-dystrophin-4 cDNA plus the 150 bp upstream of the inversion start at 860,
extending to the HpaI enzyme site (SEQ ID NO 1B)



Figure 1B
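The 150-bp extension ends at an HpaI site. HpaI recognizes GTTAAC and cuts bluntly between GTT and AAC, so in the 661-720 row of the full sequence (...TTGTTAACAATGGCAG) the first base after the cut is the A at position 710, which agrees with "Start at 710". A minimal sketch of that lookup follows; the function name is hypothetical.

```python
# Sketch: locate HpaI sites (GTT^AAC, blunt) and report the 1-based coordinate
# of the first base downstream of each cut. first_position is the coordinate
# of the first base of `seq` in the numbering used by the sequence figures.

HPAI_SITE = "GTTAAC"
CUT_OFFSET = 3  # HpaI cuts between GTT and AAC


def hpai_cut_positions(seq: str, first_position: int = 1) -> list:
    seq = seq.upper()
    positions = []
    start = seq.find(HPAI_SITE)
    while start != -1:
        positions.append(first_position + start + CUT_OFFSET)
        start = seq.find(HPAI_SITE, start + 1)
    return positions


if __name__ == "__main__":
    row_661_720 = "GTATTATACTGTAGATTTCAGTAGTTTCTAAGTCTGTTATTGTTTTGTTAACAATGGCAG"
    print(hpai_cut_positions(row_661_720, first_position=661))  # [710]
```

Only the top strand needs scanning here because GTTAAC is its own reverse complement.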






GTGGTTTGATTGATAGTAAAAAAAATGTTCGTTAATACAAGTAGAGAGTAAGTAATCAAT 

CACCAAACTAACTATCATTTTTTTTACAAGCAATTATGTTCATCTCTCATTCATTAGTTA
VV*LIVKKMFVNTSRE*VIN



CAATCACTCATAGCCAAGGTGGAAAAGATGTATCCCATCATGGAATATTCCTGTTCTGAT

GTTAGTGAGTATCGGTTCCACCTTTTCTACATAGGGTAGTACCTTATAAGGACAAGACTA
QSLIAKVEKMYPIMEYSCSD 



AGAAATCTTGTGCTTATCTATGGAATTCTTTTGATATATATTTACATTGGGAACCTGAAT

TCTTTAGAACACGAATAGATACCTTAAGAAAACTATATATAAATGTAACCCTTGGACTTA
RNLVLIYGILLIYIYIGNLN



GTAGCTTGACATTTTTCCATGTAAACACCAGTAGCCTGATCCAACATTAAGCTGATACTA

181 + + + + + + 240

CATCGAACTGTAAAAAGGTACATTTGTGGTCATCGGACTAGGTTGTAATTCGACTATGAT
VA*HFSM*TPVA*SNIKLIL 

ACAAACAACGTGTAATGGCTTCATTAATAAGGCTTTGCTTCTTCCTGGAAACTGGTGAAA 

241 + + + + + + 300 

TGTTTGTTGCACATTACCGAAGTAATTATTCCGAAACGAAGAAGGACCTTTGACCACTTT 
TNNV*WLH**GFASSWKLVK



AATCAAACCTTGTTGTGTACACCCTCGATGCAGCTTCTGTGTTGTCTTCACCCAGAAATG 

301 + + + + + + 360

TTAGTTTGGAACAACACATGTGGGAGCTACGTCGAAGACACAACAGAAGTGGGTCTTTAC
NQTLLCTPSMQLLCCLHPEM 



The polynucleotide sequence of apo-dystrophin-4 (SEQ ID NO 2) 



Figure 2 



GGGAATGATTTCCCAAATGGCAAAGAAACAGAGTGATGCTATCTATCTGCACCTTTTGTA

361 + + + + + + 420

CCCTTACTAAAGGGTTTACCGTTTCTTTGTCTCACTACGATAGATAGACGTGGAAAACAT 
GNDFPNGKETE*CYLSAPFV 



AAGTCTGTCTTTCTTTCTCTTTGTTTTCCAGGACACAATGTAGGAAGTCTTTTCCACATG 

421 + + + + + + 480 

TTCAGACAGAAAGAAAGAGAAACAAAAGGTCCTGTGTTACATCCTTCAGAAAAGGTGTAC
KSVFLSLCFPGHNVGSLFHM 



GCAGATGATTTGGGCAGAGCGATGGAGTCCTTAGTATCAGTCATGACAGATGAAGAAGGA

481 + + + + + + 540

CGTCTACTAAACCCGTCTCGCTACCTCAGGAATCATAGTCAGTACTGTCTACTTCTTCCT
ADDLGRAMESLVSVMTDEEG

GCAGAATAAATGTTTTACAACTCCTGATTCCCGCATGGTTTTTATAATATTCATACAACA

541 + + + + + + 600

CGTCTTATTTACAAAATGTTGAGGACTAAGGGCGTACCAAAAATATTATAAGTATGTTGT
AE*MFYNS*FPHGFYNIHTT

AAGAGGATTAGACAGTAAGAGTTTACAAGAAATAAATCTATATTTTTGTGAAGGGTAGTG

TTCTCCTAATCTGTCATTCTCAAATGTTCTTTATTTAGATATAAAAACACTTCCCATCAC
KRIRQ*EFTRNKSIFL*RVV



GTATTATACTGTAGATTTCAGTAGTTTCTAAGTCTGTTATTGTTTTGTTAACAATGGCAG

CATAATATGACATCTAAAGTCATCAAAGATTCAGACAATAACAAAACAATTGTTACCGTC
VLYCRFQ*FLSLLLFC*QWQ



Figure 2 (cont'd)



GTTTTACACGTCTATGCAATTGTACAAAAAAGTTATAAGAAAACTACATGTAAAATCTTG

721 + + + + + + 780

CAAAATGTGCAGATACGTTAACATGTTTTTTCAATATTCTTTTGATGTACATTTTAGAAC
VLHVYAIVQKSYKKTTCKIL

ATAGCTAAATAACTTGCCATTTCTTTATATGGAACGCATTTTGGGTTGTTTAAAAATTTA

781 + + + + + + 840

TATCGATTTATTGAACGGTAAAGAAATATACCTTGCGTAAAACCCAACAAATTTTTAAAT
IAK*LAISLYGTHFGLFKNL

TAACAGTTATAAAGAAAGAATTATAAAGGAAAAAGAAAATAACGCAATGGACAAGTGGTG
841 + + + + + + 900

ATTGTCAATATTTCTTTCTTAATATTTCCTTTTTCTTTTATTGCGTTACCTGTTCACCAC
*QL*RKNYKGKRK*RNGQVV

AAGCTGTGAACTCAGGTGTGCACAATTATCAGGAACACCCCAAAACCAAAGTGAGGTAGA

901 + + + + + + 960

TTCGACACTTGAGTCCACACGTGTTAATAGTCCTTGTGGGGTTTTGGTTTCACTCCATCT
KL*TQVCTIIRNTPKPK*GR

AATAGCATGAGAAGCCGTGTTTGATGTTAATTAATT

961 + + + 996

TTATCGTACTCTTCGGCACAAACTACAATTAATTAA
NSMRSRV*C*LI



Figure 2 (cont'd) 
Figure 4A
COS apo-4 Transfectants
Figure 4C



Figure 4D




Figure 4E 




...TAGTTTCCTATTCAATGTATAGTGCACCAAAGGTCAATTCAAGAGTTTATTATTATT

-239 + + + + + + -180

...ATCAAAGGATAAGTTACATATCACGTGGTTTCCAGTTAAGTTCTCAAATAATAATAA
. *FPIQCIVHQRSIQEFIII 

ATTTTCAACCCAAGTAAAAGCAGAGAGAAAATAGCCACCTCCACCATAGCCTCAGAAGCA

-179 + + + + + + -120

TAAAAGTTGGGTTCATTTTCGTCTCTCTTTTATCGGTGGAGGTGGTATCGGAGTCTTCGT
IFNPSKSREKIATSTIASEA

AGCCAACAGCCTGAAACAGCTTTGAAATGAAAAGTTGGTGTGGCGGTGATGGTGGCAGTG

TCGGTTGTCGGACTTTGTCGAAACTTTACTTTTCAACCACACCGCCACTACCACCGTCAC
SQQPETALK*KVGVAVMVAV

ATAATGGTGACCGATGGTTGGGTGCTGGTGATGGTAGTGGTAGTTGTGAAGGTGGTGATG

TATTACCACTGGCTACCAACCCACGACCACTACCATCACCATCAACACTTCCACCACTAC
IMVTDGWVLVMVVVVVKVVM 

GTGGTTTGATTGATAGTAAAAAAAATGTTCGTTAATACAAGTAGAGAGTAAGTAATCAAT

CACCAAACTAACTATCATTTTTTTTACAAGCAATTATGTTCATCTCTCATTCATTAGTTA
VV*LIVKKMFVNTSRE*VIN

CAATCACTCATAGCCAAGGTGGAAAAGATGTATCCCATCATGGAATATTCCTGTTCTGAT

GTTAGTGAGTATCGGTTCCACCTTTTCTACATAGGGTAGTACCTTATAAGGACAAGACTA
QSLIAKVEKMYPIMEYSCSD 

AGAAATCTTGTGCTTATCTATGGAATTCTTTTGATATATATTTACATTGGGAACCTGAAT 

121 + + + + + + 180 

TCTTTAGAACACGAATAGATACCTTAAGAAAACTATATATAAATGTAACCCTTGGACTTA
RNLVLIYGILLIYIYIGNLN

GTAGCTTGACATTTTTCCATGTAAACACCAGTAGCCTGATCCAACATTAAGCTGATACTA

181 + + + + + + 240

CATCGAACTGTAAAAAGGTACATTTGTGGTCATCGGACTAGGTTGTAATTCGACTATGAT
VA*HFSM*TPVA*SNIKLIL

ACAAACAACGTGTAATGGCTTCATTAATAAGGCTTTGCTTCTTCCTGGAAACTGGTGAAA

241 + + + + + + 300

TGTTTGTTGCACATTACCGAAGTAATTATTCCGAAACGAAGAAGGACCTTTGACCACTTT
TNNV*WLH**GFASSWKLVK

AATCAAACCTTGTTGTGTACACCCTCGATGCAGCTTCTGTGTTGTCTTCACCCAGAAATG

301 + + + + + + 360

TTAGTTTGGAACAACACATGTGGGAGCTACGTCGAAGACACAACAGAAGTGGGTCTTTAC
NQTLLCTPSMQLLCCLHPEM

GGGAATGATTTCCCAAATGGCAAAGAAACAGAGTGATGCTATCTATCTGCACCTTTTGTA

361 + + + + + + 420

CCCTTACTAAAGGGTTTACCGTTTCTTTGTCTCACTACGATAGATAGACGTGGAAAACAT
GNDFPNGKETE*CYLSAPFV 



Figure 6 



begin exon 79 
I 

AAGTCTGTCTTTCTTTCTCTTTGTTTTCCAGGACACAATGTAGGAAGTCTTTTCCACATG

421 + + + + + + 480

TTCAGACAGAAAGAAAGAGAAACAAAAGGTCCTGTGTTACATCCTTCAGAAAAGGTGTAC
KSVFLSLCFPGHNVGSLFHM



GCAGATGATTTGGGCAGAGCGATGGAGTCCTTAGTATCAGTCATGACAGATGAAGAAGGA

481 + + + + + + 540

CGTCTACTAAACCCGTCTCGCTACCTCAGGAATCATAGTCAGTACTGTCTACTTCTTCCT
ADDLGRAMESLVSVMTDEEG

GCAGAATAAATGTTTTACAACTCCTGATTCCCGCATGGTTTTTATAATATTCATACAACA

CGTCTTATTTACAAAATGTTGAGGACTAAGGGCGTACCAAAAATATTATAAGTATGTTGT
AE*MFYNS*FPHGFYNIHTT




AAGAGGATTAGACAGTAAGAGTTTACAAGAAATAAATCTATATTTTTGTGAAGGGTAGTG 

601 + + + + + + 660 

TTCTCCTAATCTGTCATTCTCAAATGTTCTTTATTTAGATATAAAAACACTTCCCATCAC
KRIRQ*EFTRNKSIFL*RVV



GTATTATACTGTAGATTTCAGTAGTTTCTAAGTCTGTTATTGTTTTGTTAACAATGGCAG

661 + + + + + + 720

CATAATATGACATCTAAAGTCATCAAAGATTCAGACAATAACAAAACAATTGTTACCGTC
VLYCRFQ*FLSLLLFC*QWQ

GTTTTACACGTCTATGCAATTGTACAAAAAAGTTATAAGAAAACTACATGTAAAATCTTG

CAAAATGTGCAGATACGTTAACATGTTTTTTCAATATTCTTTTGATGTACATTTTAGAAC
VLHVYAIVQKSYKKTTCKIL

ATAGCTAAATAACTTGCCATTTCTTTATATGGAACGCATTTTGGGTTGTTTAAAAATTTA

781 + + + + + + 840

TATCGATTTATTGAACGGTAAAGAAATATACCTTGCGTAAAACCCAACAAATTTTTAAAT
IAK*LAISLYGTHFGLFKNL
inversion start site 
I 

TAACAGTTATAAAGAAAGAATTATAAAGGAAAAAGAAAATAACGCAATGGACAAGTGGTG

841 + + + + + + 900

ATTGTCAATATTTCTTTCTTAATATTTCCTTTTTCTTTTATTGCGTTACCTGTTCACCAC
*QL*RKNYKGKRK*RNGQVV

AAGCTGTGAACTCAGGTGTGCACAATTATCAGGAACACCCCAAAACCAAAGTGAGGTAGA

901 + + + + + + 960

TTCGACACTTGAGTCCACACGTGTTAATAGTCCTTGTGGGGTTTTGGTTTCACTCCATCT
KL*TQVCTIIRNTPKPK*GR



AATAGCATGAGAAGCCGTGTTTGATGTTAATTAATT

961 + + + 996

TTATCGTACTCTTCGGCACAAACTACAATTAATTAA
NSMRSRV*C*LI 



Figure 6 (cont'd) 



[Diagram: apo-4 cDNA and genomic DNA at dystrophin exon 79. Labels: 451 bp; apo-4 cDNA exon 79 of dystrophin; genomic DNA exon 79 of dystrophin; 1.62 Kb deletion; 0.657 Kb deletion; 860 bp; 137 bp inversion; 860 bp; 2.414 Kb to end of dystrophin; 3.265 Kb of 3'UTR]



Figure 7 
[Diagram: maps of dystrophin genomic DNA, dystrophin YAC clone DNA, dystrophin phage clone DNA and apo-dystrophin cDNA around exon 79. Labels: Hind III; exon 79; 3'; 5.9 Kb; SalI; 400 bp; inverted segment map location; 137 bp; 1.62 Kb]

*cDNA map is not precisely drawn to scale



Figure 8 
Figure 9B
A. Human B. Mouse



Figure 10A 



Figure 10B 



50 1 

Mgenl073 

Hapol234 ctagtttcct attcaatgta tagtgcacca aaggtcaatt caagagttta 

Consensus 

51 100 

Mgenl073 

Hapol234 ttattattat tttcaaccca agtaaaagca gagagaaaat agccacctcc 

Consensus 

101 begin GRAIL exon @149 150 

Mgenl073 ttcACAGgCT tAAgCAGCca gtAAATGAcA 

Hapol234 accatagcct cagaagcaag ccaACAGcCT gAAaCAGCtt tgAAATGAaA 
Consensus ACAG-CT -AA-CAGC AAATGA-A 

151 200 

Mgenl073 AtT T AtgtGgtAgt cAgGtcactG 

Hapol234 AgTtggtgtg gcggtgatgg tggcagtgaT AatgGtgAcc gAtGgttggG 
Consensus A-T T A G — A — -A-G G 

201 apo-4 5' end 250

Mgenl073 TGCTGGTaAT GGTgaTctTA GcaGgcAgAG aaGGTGgTaG TGaTTTGATa 
Hapol234 TGCTGGTgAT GGTagTggTA GttGtgA.AG gtGGTGaTg G TG gTTTGATt 
Consensus TGCTGGT-AT GGT — T — TA G— G--A-AG --GGTG-T-G TG-TTTGAT- 

251 M1 300

Mgenl073 GtaAaagtgt AgAcTaTaCa acAgaAtAAa TAcAagtatA GTAA 

Hapol234 GatAgtaaaa Aa AaTg TtCg ttAatAcAAg TAgAgagtaA GTAAtcaatc 
Consensus G — A A-A-T-T-C- — A — A-AA- TA-A A GTAA 

301 M2 M3 350 

Mgenl073 ate caaCAAaGTG tgAAAGgTGT gTgCCATtAc acAtctTTCt 

Hapol234 aatcactcat agcCAAgGTG gaAAA GaTG T aTcCCATc At g gAataTTCc 
Consensus CAA-GTG — AAAG-TGT -T-CCAT-A A TTC- 

351 400 

Mgenl073 cG GtgATaagag cCTTgTCTAT GaAgTTC... TGAgATgTgT 

Hapol234 tGttctgata GaaATcttgt gCTTaTCTAT GgAaTTCttt TGAtATaTaT 
Consensus -G G — AT -CTT-TCTAT G-A-TTC TGA-AT-T-T 

401 450 

Mgenl073 TaggAagatG AAtCatcAat TtaCaT.... TTcTcCCcat cAAAtgaCAc 

Hapol234 TtacAttggG AAcCtgaAtg TagCtTgaca TTtTtCCatg tAAAcacCAg 

Consensus T A G AA-C A — T — C-T TT-T-CC -AAA CA- 

451 begin mouse GRAIL exon 500 

Mgenl073 cAtgCTGATC CAgtATTAAG CTaATACTAA C ACca tgcAatGCTT 

Hapol234 tAgcCTGATC CAacATTAAG CTgATACTAA CaaacaACgt gtaAtgGCTT 
Consensus -A--CTGATC CA — ATTAAG CT-ATACTAA C AC A — GCTT 

501 550 
Mgenl073 CATTAAcAAG GaTTTGCTTC TTgCT aGAAA tgGGT. .AAA AaCggACtgT 
Hapol234 CATTAAtAAG GcTTTGCTTC TTcCTgGAAA ctGGTgaAAA AtCaaACctT 
Consensus CATTAA-AAG G-TTTGCTTC TT-CT-GAAA — GGT — AAA A-C — AC — T 



551 600 
Mgenl073 GgTcTGTAtA CCtTCaATGC AGCTTaTGTG TTGTCTTttC C.tgAAatG 
Hapol234 GtTgTGTAcA CCcTCgATGC AGCTTcTGTG TTGTCTTcaC CcagaAAtgG 
Consensus G-T-TGTA-A CC-TC-ATGC AGCTT-TGTG TTGTCTT--C C AA — G 



Figure 11 






601 650 

Mgenl073 GtAA.TGA.cTc CCaAtAgtGg cAAccAgggG tacaATaCT TGCA 

Hapol234 GgAATGAtTt CCcAaAtgGc aAAgaAacaG agtgATgCTa tctatcTGCA 

Consensus G-AATGA-T- CC-A-A — G- -AA — A G AT-CT- TGCA 

651 exon79 700 

Mgenl073 CacTTTGTAA A CTCTT TCTTTCTCTT TGTTTTCCAG GACACAATGT 

Hapol234 CctTTTGTAA AgtctgTCTT TCTTTCTCTT TGTTTTCCAG GACACAATGT 

Consensus C — TTTGTAA A TCTT TCTTTCTCTT TGTTTTCCAG GACACAATGT 

701 750 

Mgenl073 AGGAAGcCT T TTCCACATGG CAGATGATTT GGGCAGAGCG ATGGAGTCCT 

Hapol234 AGGAAGtCTT TTCCACATGG CAGATGATTT GGGCAGAGCG ATGGAGTCCT 

Consensus AGGAAG-CTT TTCCACATGG CAGATGATTT GGGCAGAGCG ATGGAGTCCT 

751 800 

Mgenl073 TAGTtTCAGT CATGACAGAT GAAGAAGGAG CAGAATAAAT GTTTTACAAC 

Hapol234 TAGTaTCAGT CATGACAGAT GAAGAAGGAG CAGAATAAAT GTTTTACAAC 

Consensus TAGT-TCAGT CATGACAGAT GAAGAAGGAG CAGAATAAAT GTTTTACAAC 

801 850 
Mgenl073 TCCTGATTCC CGCATGGTTT TTATAATATT CgTACAACAA AGAGGATTAG 
Hapol234 TCCTGATTCC CGCATGGTTT TTATAATATT CaTACAACAA AGAGGATTAG 
Consensus TCCTGATTCC CGCATGGTTT TTATAATATT C-TACAACAA AGAGGATTAG 

851 900 
Mgenl073 ACAGTAAGAG TTTACAAGAA ATaAAATCTA TATTTTTGTG AAGGGTAGTG 
Hapol234 ACAGTAAGAG TTTACAAGAA AT.AAATCTA TATTTTTGTG AAGGGTAGTG 
Consensus ACAGTAAGAG TTTACAAGAA AT-AAATCTA TATTTTTGTG AAGGGTAGTG 

901 950 

Mgenl073 GTAcTATACT GTAGATTTCA GTAGTTTCTA AGTCTGTTAT TGTTTTGTTA 

Hapol234 GTAtTATACT GTAGATTTCA GTAGTTTCTA AGTCTGTTAT TGTTTTGTTA 

Consensus GTA-TATACT GTAGATTTCA GTAGTTTCTA AGTCTGTTAT TGTTTTGTTA 

951 1000 

Mgenl073 ACAATGGCAG GTTTTACACG TCTATGCAAT TGTACAAAAA AGTTAaAAGA 

Hapol234 ACAATGGCAG GTTTTACACG TCTATGCAAT TGTACAAAAA AGTTAtAAGA 

Consensus ACAATGGCAG GTTTTACACG TCTATGCAAT TGTACAAAAA AGTTA-AAGA 



1001 1050
Mgenl073 AA...ACATG TAAAATCTTG ATAGCTAAAT AACTTGCCAT TTCTTTATAT
Hapol234 AAactACATG TAAAATCTTG ATAGCTAAAT AACTTGCCAT TTCTTTATAT
Consensus AA   ACATG TAAAATCTTG ATAGCTAAAT AACTTGCCAT TTCTTTATAT

1051 begin inversion@1100 1100
Mgenl073 GGAACGCATT TTGGGTTGTT TAAAAATTTA TAACAGTTAT AAAGAAAGAt
Hapol234 GGAACGCATT TTGGGTTGTT TAAAAATTTA TAACAGTTAT AAAGAAAGAa
Consensus GGAACGCATT TTGGGTTGTT TAAAAATTTA TAACAGTTAT AAAGAAAGA-



1101 1150 
Mgenl073 TgtaAActaA Agtgtgcttt AtAAAAaAAg ttgtTtataA AaacccctAa 

Hapol234 TtatAAaggA A aa AgAAAAtAAc gcaaTggacA AgtggtgaAg 

Consensus T AA A A A-AAAA-AA- T A A A- 

1151 1200 
Mgenl073 acaaacACaC AcGcacaCAC AcacAcacac AcacaCaCAc AcaCAcAcTG 
Hapol234 ctgtgaACtC AgGtgtgCAC AattAtcagg AacacCcCAa AacCAaAgTG 
Consensus AC-C A-G CAC A A A C-CA- A — CA-A-TG 



1201 1243 
Mgenl073 AGGcAGcAca ttgtTttGcA ttacTtTagc gTGTatcaTA t.. 
Hapol234 AGGtAGaAat agcaTgaGaA gccgTgTttg aTGTtaatTA att 
Consensus AGG-AG-A-- T — G-A T-T -TGT TA 



Figure 11 (cont'd) 
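In most blocks of the alignment above, the Consensus line prints a base in uppercase where the Mgenl073 and Hapol234 rows agree and `-` where they differ or one row has a `.` gap; a few columns print blank instead. Assuming that majority convention, a consensus line can be rebuilt from two aligned rows as sketched below. This is a hypothetical helper, not the alignment program that produced the figure.

```python
# Sketch: rebuild a two-sequence consensus line. Assumed convention:
# identical bases (case-insensitive) -> uppercase base; mismatch or '.' gap -> '-';
# blank columns that separate the 10-base blocks stay blank.


def consensus(row_a: str, row_b: str) -> str:
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    out = []
    for a, b in zip(row_a, row_b):
        if a == " " and b == " ":
            out.append(" ")
        elif a not in ". " and a.upper() == b.upper():
            out.append(a.upper())
        else:
            out.append("-")
    return "".join(out)


if __name__ == "__main__":
    print(consensus("AGGAAGcCTT TTCCACATGG", "AGGAAGtCTT TTCCACATGG"))
```

Lowercase letters in the two sequence rows then mark exactly the columns that receive `-`.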




Figure 12A 



-70 bp from 5' end of apo-4 
I 

Inr = GCCC TCAT TCTG GAGAC 
apo-4 = GCGG TGAT GGTG GCAGT - 48% identity with Inr

71% match on base type
(purine vs. pyrimidine)



Figure 12B 
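The two percentages above compare the apo-4 stretch with the Inr element position by position: once for identical bases and once for base class (purine A/G versus pyrimidine C/T). A short sketch of both counts follows; the function names are hypothetical. Applied to the two 17-nt strings exactly as printed, it finds 8 identical positions (8/17, about 47%) and 11 same-class positions (11/17, about 65%), so the printed 48% and 71% presumably rest on a counting convention the source does not state.

```python
# Sketch: position-by-position comparison of two equal-length DNA strings.
PURINES = frozenset("AG")  # pyrimidines are C and T


def _clean(seq: str) -> str:
    return seq.replace(" ", "").upper()


def identity_fraction(a: str, b: str) -> float:
    a, b = _clean(a), _clean(b)
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def base_class_fraction(a: str, b: str) -> float:
    """Fraction of positions where both bases are purines or both are pyrimidines."""
    a, b = _clean(a), _clean(b)
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum((x in PURINES) == (y in PURINES) for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    inr = "GCCC TCAT TCTG GAGAC"
    apo4 = "GCGG TGAT GGTG GCAGT"
    print(f"identity: {identity_fraction(inr, apo4):.0%}")
    print(f"purine/pyrimidine match: {base_class_fraction(inr, apo4):.0%}")
```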
The Apo-dystrophin cDNA
Figure 17A
The Apo-dystrophin cDNA



Figure 17C 
12/23 bp spacer

CACAGTG ACAAAAACC

heptamer nonamer



Figure 18A 
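The element shown above has the layout of a recombination signal sequence: a heptamer (CACAGTG) and a nonamer (ACAAAAACC) separated by a spacer of 12 or 23 bp. A minimal sketch for scanning one strand for that layout follows. The mismatch tolerances are hypothetical parameters, not values taken from the source, and a real search would also scan the reverse complement.

```python
# Sketch: scan one strand for heptamer - spacer (12 or 23 bp) - nonamer.
HEPTAMER = "CACAGTG"
NONAMER = "ACAAAAACC"
SPACERS = (12, 23)


def mismatches(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def find_rss(seq: str, max_hep_mm: int = 0, max_non_mm: int = 1) -> list:
    """Return (0-based heptamer start, spacer length) for each candidate signal."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - len(HEPTAMER) + 1):
        if mismatches(seq[i:i + len(HEPTAMER)], HEPTAMER) > max_hep_mm:
            continue
        for spacer in SPACERS:
            j = i + len(HEPTAMER) + spacer
            nonamer = seq[j:j + len(NONAMER)]
            if len(nonamer) == len(NONAMER) and mismatches(nonamer, NONAMER) <= max_non_mm:
                hits.append((i, spacer))
    return hits


if __name__ == "__main__":
    print(find_rss("CACAGTG" + "T" * 12 + "ACAAAAACC"))  # [(0, 12)]
```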



B. 

inversion breakpoint 1

11640 11650 11660 | 11670 11680 

* * * | * * 

dystrophin T TTATAACAGT TAT AAAGAAA GA A T T GT AAAC TAAAGTGTGC 
A AATATTGTCA ATATTTCTTT CT A AACATTTG ATTTCACACG 

a 

apo-4 cDNA 840 850 | 870 

[ 138 ] T TTATAACAGT TAT AAAGAAA GA A T T a TAAAg gAAAaaGaaa> 


dystrophin T TTATAACAGT TATAAAGAAA GA A TTGTAAAC TAAAGTGTGC 



11690 11700 11710 11720 11730 

***** 

dystrophin TTTATAAAAA AAAGTTGTTT ATAAAAACCC CTAAAAACAA AACAAACACA 

AAATATTTTT TTTCAACAAA TATTTTTGGG GATTTTTGTT TTGTTTGTGT 

apo-4 cDNA 880 890 900 910 920 930 

[ 138 ] aTaAaAtggA cAAGTgGTga ATgtgAACtC aggtgtgCAc AAttAtCAgg> 


dystrophin TTTATAAAAA AAAGTTGTTT ATAAAAACCC CTAAAAACAA AACAAACACA 



11740 11750
* * 

dystrophin CACACACACA CA TACACACA 
GTGTGTGTGT GTATGTGTGT 
apo-4 cDNA 940 950 
[ 138 ] aACAC-CcCA - Aa AC - CAa A> 


dystrophin CACACACACA CATACACACA 



Figure 18B 



dystrophin 



inversion breakpoint 2

13130 13140 13150 13160 | 13170 

* * * * | * 

AATTAGCTTT TGGAGAGT GG GTTTTGTCCA T TATTAATAA TTAATTAATT 
TTAATCGAAA ACCTCTCACC CAAAACAGGT AATAATTATT AATTAATTAA 



apo-4 

dystrophin 



990 

<AATTAATT
AATTAATT 



13180 13190 13200 13210 13220 

***** 

dystrophin AACATCAAAC ACGGCTTCTC ATGCTATTTC TACCTCACTT TGGTTTTGGG 
T TGTAGTTTG TGC CGAAGAG TACGATAAAG ATGGAGTGAA ACCAAAACCC 

980 970 960 950 940 

apo-4 < AACATCAAAC ACGGCTTCTC ATGCTATTTC TACCTCACTT TGGTTTTGGG 



dystrophin AACATCAAAC ACGGCTTCTC ATGCTATTTC TACCTCACTT TGGTTTTGGG 



13230 13240 13250 13260 13270 

***** 

dystrophin GTGTTCCTGA TAATTGTGCA CACCTGAGTT CACAGCTTCA CCACTTGTCC 
CACAAGGACT ATTAACACGT GTGGACTCAA GTGTCGAAGT GGTGAACAGG 

930 920 910 900 890 

apo-4 <GTGTTCCTGA TAATTGTGCA CACCTGAGTT CACAGCTTCA CCACTTGTCC 



dystrophin GTGTTCCTGA TAATTGTGCA CACCTGAGTT CACAGCTTCA CCACTTGTCC 



13280 13290 13300 13310 13320 

***** 

dystrophin ATTGCGTTAT TTTCTTTTTC CTTTATAA TT CTTTCTTTT T CCTTCATAAT 
TAACGCAATA AAAGAAAAAG GA AATATTAA GAAAGAAAAA GGAAGTATTA 

I 

inversion breakpoints 

880 870 860 850 840 

apo-4 cATTGCGTTAT TTTCTTTTTC CTTTATAA TT CTTTCTTT aT aacTgtTAta 


dystrophin ATTGCGTTAT TTTCTTTTTC CTTTATAATT CTTTCTTTTT CCTTCATAAT 



Figure 18C 
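In the breakpoint panels above, the apo-4 coordinates run downward (980, 970, ... 840) because the 137-bp segment lies in the opposite orientation to the genomic dystrophin sequence. A quick check of that reading, offered as a sketch with hypothetical names: reverse-complementing cDNA positions 901-960 should give a string found verbatim in two consecutive genomic rows of the breakpoint-2 panel.

```python
# Sketch: confirm that an apo-4 cDNA row, reverse-complemented, occurs in the
# genomic dystrophin sequence shown across inversion breakpoint 2.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


# apo-4 cDNA positions 901-960, as printed in the sequence figures.
APO4_901_960 = "AAGCTGTGAACTCAGGTGTGCACAATTATCAGGAACACCCCAAAACCAAAGTGAGGTAGA"

# Two consecutive genomic rows from the breakpoint-2 panel, joined.
GENOMIC_ROWS = (
    "AACATCAAACACGGCTTCTCATGCTATTTCTACCTCACTTTGGTTTTGGG"
    "GTGTTCCTGATAATTGTGCACACCTGAGTTCACAGCTTCACCACTTGTCC"
)

if __name__ == "__main__":
    rc = reverse_complement(APO4_901_960)
    print(rc)
    print(rc in GENOMIC_ROWS, GENOMIC_ROWS.find(rc))
```

All 60 bases match, which is what an exact inversion of this stretch predicts.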




Figure 18D 



inversion @ 860 



Figure 19 
Figure 20A Figure 20B
Figure 23




Figure 24 





Figure 25B 



H2 starting at the second methionine: 321 bp; predicted weight = 17.4 kDa, + 1 N-glycosylation
site = 20.4 kDa.



Figure 26A 



Splice sites for peptide 

MYPIMEYSCSD RNLVLIYGILLIYIYIGNLNM KKEQNKCFTTPDSRMVFIIFIQQRGLDSKSLQEINL
YFCEGFYTSMQLYKKVIRKLHKITQWTRTPQNQSEVEIA 107



Figure 26B 



Start    Exon No.  Exon Position  Exon Length  Intron No.  Intron Position  Intron Length
@88 bp   78.3      @74-180        106 bp       79.1        @181-529         349 bp
         79.1      @530-654       125 bp       79.4        @655-720         66 bp
         79.4      @721-769       49 bp        79.55       @770-875         105 bp
         79.55     @876-893       18 bp        79.75       @894-932         39 bp
         79.85     @933-966       33 bp









Hydrophobicity Scale KD; Candidate membrane-spanning segments: 



Certain 1 12-32 1.8833



Figure 26C 
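The "Candidate membrane-spanning segments" lines list a segment number, a residue span and a hydrophobicity score on the Kyte-Doolittle (KD) scale, and each listed span is 21 residues long. A simplified sketch of such a scan follows. Assumptions: a plain 21-residue rectangular window (TopPred uses a trapezoid-shaped window, so its scores differ), illustrative cutoffs of 1.0 for "Certain" and 0.6 for "Putative", and hypothetical function names; it will not reproduce the printed scores exactly.

```python
# Sketch: Kyte-Doolittle (KD) hydropathy averaged over a sliding window.
# Assumptions: rectangular 21-residue window (TopPred uses a trapezoid window),
# illustrative cutoffs 1.0 ("Certain") and 0.6 ("Putative").

KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_scores(protein: str, window: int = 21) -> list:
    """Mean KD value per window; entry i covers residues i+1 .. i+window (1-based)."""
    values = [KD[aa] for aa in protein.upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def candidate_segments(protein: str, window: int = 21,
                       certain: float = 1.0, putative: float = 0.6):
    """Yield (label, first, last, score) for local maxima at or above `putative`."""
    scores = window_scores(protein, window)
    for i, score in enumerate(scores):
        left = scores[i - 1] if i > 0 else float("-inf")
        right = scores[i + 1] if i + 1 < len(scores) else float("-inf")
        if score >= putative and score >= left and score > right:
            label = "Certain" if score >= certain else "Putative"
            yield label, i + 1, i + window, round(score, 4)


if __name__ == "__main__":
    toy = "R" * 5 + "I" * 21 + "R" * 5  # one obvious hydrophobic stretch
    print(list(candidate_segments(toy)))  # [('Certain', 6, 26, 4.5)]
```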



- Predicted TM structure 

> : Too long to be significative 
< : Too short to be significative 
LL : Loop length
KR : Number of Lys and Arg 



KR Diff : Positive charge difference 

CE : Net charge energy 

CE Diff ; Net charge difference 

CH Diff : Charge difference over N-term segments




KR Diff = 1 CYTOPLASM
CH Diff = -3 OUTSIDE
CE Diff = 0.54 OUTSIDE



Structure no. 1 



Figure 26D 



A readthrough apo-4S product using the second available methionine 



The Apo-4S peptide sequence 

PI Begin TMi (R) 

+30 | P2 

MYPIMEYSCSD RNLVLIYGIL LIYIYIGNLN VARHFSMKTP VARSNIKLIL 80

TNNVKWLHKK GFASSWKLVK NQTLLCTPSM QLLCCLHPEM GNDFPNGKET 130

P3

ERCYLSAPFV KSVFLSLCFP GHNVGSLFHM ADDLGRAMES LVSVMTDEEG 180



AEKMFYNSRF PHGFYNIHTT KRIRQKEFTR NKSIFLRRVV VLYCRFQKFL 230
SLLLFCKQWQ VLHVYAIVQK SYKKTTCKIL IAKKLAISLY GTHFGLFKNL 280
KQLKRKNYKG KRKKRNGQVV KLRTQVCTII RNTPKPKRGR NSMRSRVRCK 330
LI 332 (302aa in predicted polypeptide) 

Figure 27A



Candidate membrane-spanning segments: 

Certain 1 41-61 1.9073 
Putative 2 101-121 0.8052 
Certain 3 132-152 1.2552 
Putative 4 217-237 1.1833 
Putative 5 254-274 0.9240 



Transmembrane segments included in structure No. 8:1 2 3 4 5 

Loop lengths: 11 39 10 64 16 58; K+R profile: 1 2 5 (9 >22) 

K+R difference: -23; -> Orientation: N-out

Charge-difference over N-terminal Membr. segs. (±15 residues): -4
-> Orientation: N-out

CYT-EXT profile (neg. values indicate cytoplasmic preference): < -0.13 <
CYT-EXT difference: 0.13; -> Orientation: N-out



Figure 27B 
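The K+R lines apply the positive-inside rule: loops rich in lysine (K) and arginine (R) tend to be cytoplasmic, so comparing K+R counts in alternating loops suggests whether the N-terminus lies inside or outside. The sketch below illustrates that comparison under stated assumptions: loops are the stretches between the listed TM segments, loops longer than a hypothetical 60-residue cutoff are skipped (echoing the ">" too-long marker), and a positive difference means N-in. TopPred's exact scoring, including its sign convention, may differ.

```python
# Sketch of a positive-inside orientation call from K+R counts in loops.
# Assumptions (not TopPred's exact procedure): loops are the stretches between
# the given 1-based, inclusive TM segments; loops longer than max_loop residues
# are skipped; difference = K+R in loops 1, 3, 5, ... minus K+R in loops 2, 4, ...


def loops(length: int, segments: list) -> list:
    """Return (first, last) 1-based loop boundaries around the TM segments."""
    bounds, start = [], 1
    for first, last in sorted(segments):
        bounds.append((start, first - 1))
        start = last + 1
    bounds.append((start, length))
    return bounds


def kr_difference(protein: str, segments: list, max_loop: int = 60) -> int:
    diff = 0
    for n, (first, last) in enumerate(loops(len(protein), segments)):
        loop = protein[first - 1:last].upper()
        if len(loop) > max_loop:
            continue  # too long to be significant
        kr = loop.count("K") + loop.count("R")
        # Loops 1, 3, 5, ... lie on the same side of the membrane as the N-terminus.
        diff += kr if n % 2 == 0 else -kr
    return diff


def orientation(protein: str, segments: list, max_loop: int = 60) -> str:
    """More K+R on the N-terminal side -> that side is cytoplasmic (N-in).
    A tie (difference 0) is reported as N-out in this sketch."""
    return "N-in" if kr_difference(protein, segments, max_loop) > 0 else "N-out"


if __name__ == "__main__":
    print(orientation("KKKK" + "L" * 20 + "DDDD", [(5, 24)]))  # N-in
```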



> : Too long to be significative KR Diff : Positive charge difference

< : Too short to be significative CE : Net charge energy

LL : Loop length CE Diff : Net charge difference

KR : Number of Lys and Arg CH Diff : Charge difference over N-term segments

CE=< CE = -0.13 CE=<

KR = 9 KR=> KR = 22

LL = 39 LL = 64 LL = 58

LL = 11 LL = 10 LL = 16

KR = 1 KR = 2 KR = 5

CE=< CE =< CE =<

KR Diff = -23 OUTSIDE

CH Diff = -4 OUTSIDE

CE Diff = 0.13 OUTSIDE Structure no. 8



Figure 27C 




Figure 28 




Figure 28 (cont'd) 



Figure 29 



Figure 29 (cont'd) 




Figure 30 




Figure 31 
Additional Oligonucleotide primers used for apo-dystrophin-4
Southern blotting and sequencing

FORWARD

GTT CGT TAA TAC AAG TAG          F2.3(@28)       (SEQ ID NO 15)
GCC AAG GTG GAA AAG ATG          F2.2(@73)       (SEQ ID NO 16)
CCA GTA GCC TGA TCC AAC          F3.2(@208)      (SEQ ID NO 17)
GGC TTC ATT AAT AAG              F3.1(@257)      (SEQ ID NO 18)
GGC AAA GAA ACA GAG TG           F4.2(@379)      (SEQ ID NO 19)
CAG GAC ACA ATG TAG GA           F4.1(@449)      (SEQ ID NO 20)
GTT ATA AAG AAA GAA TTA TAA AG   FJ n(@846)      (SEQ ID NO 21)
GAA AAT AAC GCA ATG GAC          F5.1(@875)      (SEQ ID NO 22)

REVERSE

GAT GGG ATA CAT CTT TTC C        R6.1(@99)       (SEQ ID NO 23)
CAA GCT ACA TTC AGG TTC CC       F2.2R(@188)     (SEQ ID NO 24)
GGA CTC CAT CGC TCT GCC          R4.1(@510)      (SEQ ID NO 25)
GAC TTA GAA ACT ACT G            R3.4(@694)      (SEQ ID NO 26)
ATA GAC GTG TAA AAC CTG C        R2.1(@735)      (SEQ ID NO 27)
AAC TGT TAT AAA TTT TTA          RSP2(@848)      (SEQ ID NO 28)
CTT TTT CCT TTA TAA TTC TTT C    R2.3 0 (@875)   (SEQ ID NO 29)



Figure 34 
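The "@" numbers on the primers above appear to give positions in the apo-dystrophin-4 numbering: for forward primers the coordinate of the primer's first (5') base, and for reverse primers the top-strand coordinate opposite the primer's 5' base (for example, the reverse complement of RSP2 spans 831-848). A minimal sketch that computes a position under that inferred convention follows; the convention and the names are assumptions, not taken from the source.

```python
# Sketch: compute the '@' coordinate of a primer on a numbered template.
# Inferred convention: forward -> position of the primer's 5' base;
# reverse -> top-strand position paired with the primer's 5' base.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def primer_position(template: str, primer: str, first_position: int = 1,
                    reverse: bool = False):
    """Return the 1-based '@' coordinate, or None if the primer is not found."""
    primer = primer.replace(" ", "").upper()
    target = reverse_complement(primer) if reverse else primer
    index = template.upper().find(target)
    if index == -1:
        return None
    return first_position + index + (len(target) - 1 if reverse else 0)


if __name__ == "__main__":
    rows_781_900 = (
        "ATAGCTAAATAACTTGCCATTTCTTTATATGGAACGCATTTTGGGTTGTTTAAAAATTTA"
        "TAACAGTTATAAAGAAAGAATTATAAAGGAAAAAGAAAATAACGCAATGGACAAGTGGTG"
    )
    print(primer_position(rows_781_900, "GAA AAT AAC GCA ATG GAC", 781))               # F5.1
    print(primer_position(rows_781_900, "AAC TGT TAT AAA TTT TTA", 781, reverse=True))  # RSP2
```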



An Additional Splice Product Predicted From The Apo-4 Gene 

A second theoretical splice product, which retains exon 78.3, is shown below.

H2 pl-124 spliced product = 351 bp, 117 amino acids + 10 from vector + 1 N-glycosylation
site; predicted weight = 21.9 kDa

Figure 35A 

Peptide Generated 

MFVNTTKVEKMYPIMEYSCSDRNLVLIYGILLIYIYIGNLNMKKEQNKCFTTPDSRMVFII
FIQQRGLDSKSLQEINLYFCEGFYTSMQLYKKVIRKLHKITQWTRTPQNQSEVEIA (117
amino acids) (SEQ ID NO 30)

Figure 35B 



Start    Exon No.  Exon Position  Exon Length  Intron No.  Intron Position  Intron Length
@26 bp   78.1      @16-41         26 bp        78.3        @42-74           33 bp
         78.3      @75-181        106 bp       79.1        @182-530         349 bp
         79.1      @531-655       125 bp       79.4        @656-721         66 bp
         79.4      @722-770       49 bp        79.55       @771-876         105 bp
         79.55     @877-894       18 bp        79.75       @895-933         39 bp
         79.85     @934-967       33 bp









Hydrophobicity Scale KD; Candidate membrane-spanning segments: 



Certain 1 22-42 1.8833

Figure 35C 



Predicted TM structure

> : Too long to be significative KR Diff : Positive charge difference

< : Too short to be significative CE : Net charge energy

LL : Loop length CE Diff : Net charge difference

KR : Number of Lys and Arg CH Diff : Charge difference over N-term segments




KRDiff- 3 CYTOPLASM 
cm an- 2 OUTSIDE 
CEDirf« 0.M OUTSIDE 



Structure no. 1 



Figure 35D 



Nucleic Acid Subsequence Sites Identified In Apo-4 



Motif                       Position              Significance
CpG                         -7, (+28, +106)       DNA methylation site
CAAT                        -132, (+127, +131)    Binding of CAAT factors
TATAAT (5/6)                -120, -114, (+10)     TFIID binding site
TATA                        -154                  Binds RNA polymerase II and TFIID
CCATTCA                     -162, -131            Cap site I
TATCAGT                     +12, (+25)            Cap site II
TGGCTGCAAGCCCAA (10/14)     -57, (+41)            Binds CTF/NF-I protein
GTGATGG                     -140, -4, +11, +32    Eukaryotic transcription initiation site



Figure 36 
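Several motifs in the table above are scored as partial matches, for example TATAAT (5/6) and TGGCTGCAAGCCCAA (10/14). A minimal sketch of such a mismatch-tolerant scan, plus a CpG count, follows; the function names and the coordinate-offset argument are hypothetical, and only the given strand is searched.

```python
# Sketch: report positions where a motif matches at least `min_matches` bases.
# first_position shifts 0-based indices into a figure's numbering (for example,
# -239 for a sequence whose first base is numbered -239).


def fuzzy_find(seq: str, motif: str, min_matches=None, first_position: int = 0) -> list:
    seq, motif = seq.upper(), motif.upper()
    needed = len(motif) if min_matches is None else min_matches
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        if sum(a == b for a, b in zip(window, motif)) >= needed:
            hits.append(first_position + i)
    return hits


def count_cpg(seq: str) -> int:
    """Number of CpG dinucleotides (candidate methylation sites)."""
    seq = seq.upper()
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")


if __name__ == "__main__":
    print(fuzzy_find("GGTATATTCC", "TATAAT", min_matches=5))  # [2]
```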



TopPred predicts 4-5 transmembrane domains for a full-length
apo-4F product in which all the stop codons are suppressed.

Protein sequence and position of predicted TM domains 

Begin TMi (R) 
PI I P2 

MFVNTSREKV INQSLIAKVE KMYPIMEYSC SDRNLVLIYG ILLIYIYIGN LNVARHFSMK 60

TPVARSNIKL ILTNNVKWLH KKGFASSWKL VKNQTLLCTP SMQLLCCLHP EMGNDFPNGK 120

P3

ETERCYLSAP FVKSVFLSLC FPGHNVGSLF HMADDLGRAM ESLVSVMTDE EGAEKMFYNS 180

RFPHGFYNIH TTKRIRQKEF TRNKSIFLRR VVVLYCRFQK FLSLLLFCKQ WQVLHVYAIV 240

QKSYKKTTCK ILIAKKLAIS LYGTHFGLFK NLKQLKRKNY KGKRKKRNGQ VVKLRTQVCT 300

IIRNTPKPKR GRNSMRSRVR CKLI (324 amino acids) (SEQ ID NO 31)

Hydrophobicity Scale KD 

Figure 37A 



Apo-4F : Candidate membrane-spanning segments: 



Certain 1 33-53 1.9073
Putative 2 93-113 0.8052
Certain 3 124-144 1.2552
Putative 4 209-229 1.1833
Putative 5 246-266 0.9240



I. Transmembrane segments included in structure 8: 1 2 3 4 5; Loop lengths: 32 39 10 64 16 58



Figure 37B 



K+R difference: -19; -> Orientation: N-out; Charge-difference over N-terminal Membr. segs. 
(±15 residues): -3; -> Orientation: N-out 

CYT-EXT profile (neg. values indicate cytoplasmic preference): < < < < -0.13 <

CYT-EXT difference: 0.13
-> Orientation: N-out

II. Transmembrane segments included in structure 7: 1 3 4 5; Loop lengths: 32 70 64 16 58

K+R profile: 5 > 22 > 5; K+R difference: 22 -> Orientation: N-in
Charge-difference over N-terminal Membr. segs. (±15 residues): -3; -> Orientation: N-out
CYT-EXT profile (neg. values indicate cytoplasmic preference): < -0.13 < -0.26 <
CYT-EXT difference: 0.13; -> Orientation: N-out

Figure 37B (cont'd) 



TopPred predicts a cytoplasmic N-terminus for four TM domains 



> : Too long to be significative
< : Too short to be significative
LL : Loop length
KR : Number of Lys and Arg



KR Diff : Positive charge difference

CE : Net charge energy

CE Diff : Net charge difference

CH Diff : Charge difference over N-term segments



CE = -0.26 CE = <
KR=> KR = 5 

LL = 70 LL = 16 




LL = 32 
KR = 5 
CE=< 



LL = 64 
KR => 



KR=22 



CE=-0.13 CE=< 
KRDiff= 22 CYTOPLASM 

CHDiff= -3 OUTSIDE 
CEDiff= 0.13 OUTSIDE 



Structure no. 7 



Figure 37C 



Basic Features of a Transposon or Retrovirus 





[Diagram: basic features of a transposon or retrovirus. Labels: 250-600 bp; mobile genes / virion: gag | pol | env; 3'; target site; 10-50 bp; 5-10 bp; inverted repeat (IR); direct repeat (DR)]



Figure 38A 



Structure of the apo-4 inversion element before rearrangement 




9bp DR 12bp IR 12bp IR 6bp IR 9bp DR



Figure 38B 
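The element diagram above is described in terms of direct repeats (DR) and inverted repeats (IR). As a sketch under simple assumptions (exact matches only, hypothetical function names), the two checks can be written as follows: an IR of length k means the first k bases of an element equal the reverse complement of its last k bases, and a DR means the same short sequence appears on both flanks, as a target-site duplication would.

```python
# Sketch: exact-match checks for a terminal inverted repeat (IR) and a
# flanking direct repeat (DR). Names are hypothetical.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def terminal_inverted_repeat(element: str) -> int:
    """Largest k (at most half the element) such that the first k bases equal
    the reverse complement of the last k bases."""
    element = element.upper()
    k = 0
    while k < len(element) // 2 and element[:k + 1] == reverse_complement(element[-(k + 1):]):
        k += 1
    return k


def flanking_direct_repeat(left: str, right: str, length: int) -> bool:
    """True if the last `length` bases of the left flank equal the first
    `length` bases of the right flank, as in a target-site duplication."""
    if length <= 0:
        raise ValueError("length must be positive")
    if len(left) < length or len(right) < length:
        return False
    return left[-length:].upper() == right[:length].upper()


if __name__ == "__main__":
    element = "GACTTAGC" + "TTTT" + "GCTAAGTC"
    print(terminal_inverted_repeat(element))  # 8
```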



Deleted sequences 



Exon 2



5' 



Exon 3 




The RNA transcript is promoted from cellular sequences but enhanced
and terminated by viral sequences.



Figure 39A 



Deleted 1.62Kb 



Exon 78.3



Exon 79 



5' 



inversion



The RNA transcript is promoted from cellular sequences but enhanced
and terminated by inversion sequences, which may also
activate suppressor tRNAs or reverse transcriptase activity that
prevents the recognition of stop codons. Inverted repeats (IR)
are present at both ends of the inversion, as they are in
retroviruses and transposable elements.



Figure 39B 



